-
Notifications
You must be signed in to change notification settings - Fork 34
Preserve legacy Xarray merge settings #1016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
09da560 to
33dc8b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting compat="no_conflicts" and join="outer" when merging datasets, which were the default behaviors before Xarray v2025.08.0.
Key Changes:
- Introduced a global constant
LEGACY_XARRAY_MERGE_KWARGSto centralize merge configuration - Applied legacy merge settings to
xr.merge()calls in dataset utilities - Applied legacy merge settings to
xr.open_mfdataset()in the new_open_mfdataset()helper function - Refactored duplicate code in
mp_partition_driver.pyby extracting a reusable_open_mfdataset()function
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
e3sm_diags/__init__.py |
Defines LEGACY_XARRAY_MERGE_KWARGS constant with legacy Xarray merge settings |
e3sm_diags/driver/utils/dataset_xr.py |
Applies legacy merge kwargs to two xr.merge() calls |
e3sm_diags/driver/tropical_subseasonal_driver.py |
Adds join="outer" to existing xr.merge() call already using compat="override" |
e3sm_diags/driver/mp_partition_driver.py |
Extracts duplicated code into _open_mfdataset() helper and applies legacy merge settings; improves docstring formatting |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tomvothecoder
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mahf708 and @chengzhuzhang, this PR is ready for review. It sets legacy Xarray settings on remaining xr.open_mfdataset() and xr.merge() calls to prevent a FutureWarning about changing kwargs.
@mahf708 Can you try pip installing the build from this branch in your env and see if the FutureWarnings are now gone?
| try: | ||
| # xr.open_mfdataset() can accept an explicit list of files. | ||
| landfrac = xr.open_mfdataset(glob.glob(f"{test_data_path}/LANDFRAC_*")).sel( | ||
| lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31") | ||
| )["LANDFRAC"] | ||
| temp = xr.open_mfdataset(glob.glob(f"{test_data_path}/T_*.nc")).sel( | ||
| lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31") | ||
| )["T"] | ||
| cice = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDICE_*.nc")).sel( | ||
| lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31") | ||
| )["CLDICE"] | ||
| cliq = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDLIQ_*.nc")).sel( | ||
| lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31") | ||
| )["CLDLIQ"] | ||
| landfrac = _open_mfdataset(test_data_path, "LANDFRAC", start_year, end_year) | ||
| temp = _open_mfdataset(test_data_path, "T", start_year, end_year) | ||
| cice = _open_mfdataset(test_data_path, "CLDICE", start_year, end_year) | ||
| cliq = _open_mfdataset(test_data_path, "CLDLIQ", start_year, end_year) | ||
| except OSError: | ||
| logger.info( | ||
| f"No files to open for variables within {start_year} and {end_year} from {test_data_path}." | ||
| ) | ||
| raise | ||
|
|
||
| parameter.test_name_yrs = test_data.get_name_yrs_attr(season) | ||
|
|
||
| metrics_dict["test"] = {} | ||
| metrics_dict["test"]["T"], metrics_dict["test"]["LCF"] = compute_lcf( | ||
| cice, cliq, temp, landfrac | ||
| ) | ||
|
|
||
| if run_type == "model-vs-model": | ||
| ref_data = Dataset(parameter, data_type="ref") | ||
|
|
||
| ref_data_path = parameter.reference_data_path | ||
| start_year = parameter.ref_start_yr | ||
| end_year = parameter.ref_end_yr | ||
| # xr.open_mfdataset() can accept an explicit list of files. | ||
|
|
||
| try: | ||
| landfrac = xr.open_mfdataset(glob.glob(f"{ref_data_path}/LANDFRAC_*")).sel( | ||
| lat=slice(-70, -30), | ||
| time=slice(f"{start_year}-01-01", f"{end_year}-12-31"), | ||
| )["LANDFRAC"] | ||
| temp = xr.open_mfdataset(glob.glob(f"{ref_data_path}/T_*.nc")).sel( | ||
| lat=slice(-70, -30), | ||
| time=slice(f"{start_year}-01-01", f"{end_year}-12-31"), | ||
| )["T"] | ||
| cice = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDICE_*.nc")).sel( | ||
| lat=slice(-70, -30), | ||
| time=slice(f"{start_year}-01-01", f"{end_year}-12-31"), | ||
| )["CLDICE"] | ||
| cliq = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDLIQ_*.nc")).sel( | ||
| lat=slice(-70, -30), | ||
| time=slice(f"{start_year}-01-01", f"{end_year}-12-31"), | ||
| )["CLDLIQ"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted common code into an _open_mfdataset() function at the bottom of this file to make the code DRY.
| def _open_mfdataset( | ||
| data_path: str, var: str, start_year: int, end_year: int | ||
| ) -> xr.DataArray: | ||
| """ | ||
| Open multiple NetCDF files as a single xarray Dataset and subset by time | ||
| and latitude. | ||
| This function reads multiple NetCDF files matching the specified variable | ||
| name and combines them into a single xarray Dataset. The data is then | ||
| subsetted based on the specified time range and latitude bounds. | ||
| Parameters | ||
| ---------- | ||
| data_path : str | ||
| The path to the directory containing the NetCDF files. | ||
| var : str | ||
| The variable name to match in the file pattern. | ||
| start_year : int | ||
| The starting year for the time subsetting. | ||
| end_year : int | ||
| The ending year for the time subsetting. | ||
| Returns | ||
| ------- | ||
| xr.DataArray | ||
| The subsetted DataArray for the specified variable, filtered by time | ||
| and latitude. | ||
| """ | ||
| file_pattern = f"{data_path}/{var}_*.nc" | ||
| ds = xr.open_mfdataset( | ||
| glob.glob(file_pattern), | ||
| data_vars="minimal", | ||
| **LEGACY_XARRAY_MERGE_KWARGS, # type: ignore[ arg-type ] | ||
| ) | ||
|
|
||
| ds_sub = ds.sel( | ||
| lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31") | ||
| ) | ||
|
|
||
| da_var = ds_sub[var] | ||
|
|
||
| return da_var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DRY function.
| # Settings to preserve legacy Xarray behavior when merging datasets. | ||
| # See https://xarray.pydata.org/en/stable/user-guide/io.html#combining-multiple-files | ||
| # and https://xarray.pydata.org/en/stable/whats-new.html#id14 | ||
| LEGACY_XARRAY_MERGE_KWARGS = { | ||
| # "override", "exact" are the new defaults as of Xarray v2025.08.0 | ||
| "compat": "no_conflicts", | ||
| "join": "outer", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant variable for legacy xarray merge settings.
|
@tomvothecoder The logs got messy after this xarray update. Thanks for working on this PR to fix this. The code change looks good to me. |
Description
Closes #1012
This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting
compat="no_conflicts"andjoin="outer"when merging datasets, which were the default behaviors before Xarray v2025.08.0.Key Changes:
LEGACY_XARRAY_MERGE_KWARGSto centralize merge configurationxr.merge()calls in dataset utilitiesmp_partition_driver.pyby extracting a reusable_open_mfdataset()functionxr.open_mfdataset()in the new_open_mfdataset()helper functionChecklist
If applicable: